NSF PAR Search | NSF Public Access Repository

Vargas: heuristic-free alignment for assessing linear and graph read aligners

https://doi.org/10.1093/bioinformatics/btaa265

Darby, Charlotte A; Gaddipati, Ravi; Schatz, Michael C; Langmead, Ben; Birol, Inanc (April 2020, Bioinformatics)

Abstract

Motivation: Read alignment is central to many aspects of modern genomics. Most aligners use heuristics to accelerate processing, but these heuristics can fail to find the optimal alignments of reads. Alignment accuracy is typically measured through simulated reads; however, the simulated location may not be the (only) location with the optimal alignment score.

Results: Vargas implements a heuristic-free algorithm guaranteed to find the highest-scoring alignment for real sequencing reads to a linear or graph genome. With semiglobal and local alignment modes and affine gap and quality-scaled mismatch penalties, it can implement the scoring functions of commonly used aligners to calculate optimal alignments. While this is computationally intensive, Vargas uses multi-core parallelization and vectorized (SIMD) instructions to make it practical to optimally align large numbers of reads, achieving a maximum speed of 456 billion cell updates per second. We demonstrate how these “gold standard” Vargas alignments can be used to improve heuristic alignment accuracy by optimizing command-line parameters in Bowtie 2, BWA-MEM, and vg to align more reads correctly.

Availability and implementation: Source code implemented in C++ and compiled binary releases are available at https://github.com/langmead-lab/vargas under the MIT license.

Supplementary information: Supplementary data are available at Bioinformatics online.
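To make "heuristic-free" concrete, the following is a minimal sketch of exhaustive local alignment with affine gap penalties on a linear reference. It is not the Vargas implementation: it omits the graph genome, the semiglobal mode, quality-scaled mismatch penalties, and the SIMD and multi-core parallelization described in the abstract. The function name `local_affine_score` and the default scoring values are assumptions chosen for illustration.

```python
NEG_INF = float("-inf")


def local_affine_score(read, ref, match=2, mismatch=-6,
                       gap_open=-5, gap_extend=-3):
    """Return the best local alignment score of `read` against `ref`.

    Every cell of the dynamic-programming matrix is filled, so no
    candidate location is skipped. A gap of length k costs
    gap_open + k * gap_extend (hypothetical illustrative defaults).
    """
    n = len(ref)
    prev_h = [0] * (n + 1)        # best score ending at the previous row
    prev_f = [NEG_INF] * (n + 1)  # previous row, ending in a gap that skips read bases
    best = 0
    for i in range(1, len(read) + 1):
        cur_h = [0] * (n + 1)
        cur_f = [NEG_INF] * (n + 1)
        e = NEG_INF               # current row, ending in a gap that skips reference bases
        for j in range(1, n + 1):
            # Open a new gap from the best score, or extend an existing one.
            e = max(cur_h[j - 1] + gap_open + gap_extend, e + gap_extend)
            cur_f[j] = max(prev_h[j] + gap_open + gap_extend,
                           prev_f[j] + gap_extend)
            step = match if read[i - 1] == ref[j - 1] else mismatch
            diag = prev_h[j - 1] + step
            # Local alignment: a score never drops below zero.
            cur_h[j] = max(0, diag, e, cur_f[j])
            best = max(best, cur_h[j])
        prev_h, prev_f = cur_h, cur_f
    return best
```

Each inner-loop iteration corresponds to one "cell update" in the sense used by the abstract; the cost of filling every cell is what the abstract calls computationally intensive.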
Full Text Available

Bayesian structural equation modeling in multiple omics data with application to circadian genes

https://doi.org/10.1093/bioinformatics/btaa286

Maity, Arnab Kumar; Lee, Sang Chan; Mallick, Bani K; Sarkar, Tapasree Roy; Birol, Inanc (May 2020, Bioinformatics)

Abstract

Motivation: It is well known that integrating different data sources is reliable because it can unveil new functionalities of genomic expression that might remain dormant in a single-source analysis. Moreover, several studies have shown that analyses of multi-platform data are more powerful. To this end, we consider the omics profiles of circadian genes, such as copy number changes and RNA-sequencing data, along with their survival response. We develop a Bayesian structural equation model coupled with linear regressions and a log-normal accelerated failure-time regression to integrate the information from these two platforms and predict the survival of the subjects. We place conjugate priors on the regression parameters and derive the Gibbs sampler from their conditional distributions.

Results: Our extensive simulation study shows that the integrative model provides a better fit to the data than its closest competitor. The analyses of glioblastoma cancer data and breast cancer data from TCGA, the largest genomics and transcriptomics database, support our findings.

Availability and implementation: The developed method is wrapped in an R package available at https://github.com/MAITYA02/semmcmc.

Supplementary information: Supplementary data are available at Bioinformatics online.
Full Text Available

Parallel Clustering of Single Cell Transcriptomic Data with Split-Merge Sampling on Dirichlet Process Mixtures

https://doi.org/10.1093/bioinformatics/bty702

Duan, Tiehang; Pinto, José P; Xie, Xiaohui; Birol, Inanc (August 2018, Bioinformatics)

Full Text Available